This report leverages the mtcars dataset to address two key questions:
- Comparative Analysis of Transmission Types: Determine whether vehicles with automatic transmissions exhibit better fuel efficiency (measured in miles per gallon, MPG) compared to those with manual transmissions.
- Quantification of MPG Differences: Precisely measure and analyze the MPG difference between automatic and manual transmissions.
The mtcars dataset contains 32 observations of 11 variables:
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
## NULL
| Statistic | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. | 10.40 | 4.000 | 71.1 | 52.0 | 2.760 | 1.513 | 14.50 | 0.0000 | 0.0000 | 3.000 | 1.000 |
| 1st Qu. | 15.43 | 4.000 | 120.8 | 96.5 | 3.080 | 2.581 | 16.89 | 0.0000 | 0.0000 | 3.000 | 2.000 |
| Median | 19.20 | 6.000 | 196.3 | 123.0 | 3.695 | 3.325 | 17.71 | 0.0000 | 0.0000 | 4.000 | 2.000 |
| Mean | 20.09 | 6.188 | 230.7 | 146.7 | 3.597 | 3.217 | 17.85 | 0.4375 | 0.4062 | 3.688 | 2.812 |
| 3rd Qu. | 22.80 | 8.000 | 326.0 | 180.0 | 3.920 | 3.610 | 18.90 | 1.0000 | 1.0000 | 4.000 | 4.000 |
| Max. | 33.90 | 8.000 | 472.0 | 335.0 | 4.930 | 5.424 | 22.90 | 1.0000 | 1.0000 | 5.000 | 8.000 |
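The structure and summary shown above can be reproduced with base R. The snippet below is a minimal sketch; the `transmission` factor and `transmission_counts` table are hypothetical helper names added for illustration, relying on the standard mtcars coding where `am = 0` is automatic and `am = 1` is manual.

```r
# mtcars ships with base R (package 'datasets'), so nothing needs downloading.
data(mtcars)

# Structure and per-variable summaries, as printed above.
str(mtcars)
summary(mtcars)

# Hypothetical helper: label the transmission type (am = 0 automatic, am = 1 manual).
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))
transmission_counts <- table(transmission)
print(transmission_counts)
```

Counting the cars per transmission type up front makes it clear that the two groups are of unequal size, which matters when their means are compared later.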
The methodology for this report follows a systematic approach. The code fits a series of candidate models, starting from single-predictor models and adding one variable at a time, and then compares their fit statistics. This preliminary comparison helps identify the model that best fits the data, which supports more reliable predictions.
The table below presents all the models under consideration. Each model was fitted with the R function lm() and evaluated from its summary() output. The models are assessed on adjusted R², AIC, BIC, and the number of statistically significant predictors:
| Model | Adj_R_Squared | AIC | BIC | SignificantPredictors | ModelNumber |
|---|---|---|---|---|---|
| mpg~cyl | 0.7170527 | 169.3064 | 173.7036 | 1 | 1 |
| mpg~disp | 0.7089548 | 170.2094 | 174.6066 | 1 | 2 |
| mpg~ cyl+disp | 0.7429841 | 167.1456 | 173.0086 | 2 | 3 |
| mpg~hp | 0.5891853 | 181.2386 | 185.6358 | 1 | 4 |
| mpg~ cyl+disp + hp | 0.7430186 | 168.0184 | 175.3471 | 1 | 5 |
| mpg~drat | 0.4461283 | 190.7999 | 195.1971 | 1 | 6 |
| mpg~ cyl+disp + hp + drat | 0.7502914 | 167.9360 | 176.7304 | 1 | 7 |
| mpg~wt | 0.7445939 | 166.0294 | 170.4266 | 1 | 8 |
| mpg~ cyl+disp + hp + drat + wt | 0.8227219 | 157.7659 | 168.0260 | 2 | 9 |
| mpg~qsec | 0.1478062 | 204.5881 | 208.9853 | 1 | 10 |
| mpg~ cyl+disp + hp + drat + wt + qsec | 0.8199798 | 159.0020 | 170.7279 | 1 | 11 |
| mpg~vs | 0.4223126 | 192.1471 | 196.5443 | 1 | 12 |
| mpg~ cyl+disp + hp + drat + wt + qsec + vs | 0.8126278 | 160.9766 | 174.1682 | 1 | 13 |
| mpg~am | 0.3384589 | 196.4844 | 200.8816 | 1 | 14 |
| mpg~ cyl+disp + hp + drat + wt + qsec + vs + am | 0.8218062 | 160.0075 | 174.6648 | 1 | 15 |
| mpg~gear | 0.2050292 | 202.3638 | 206.7611 | 1 | 16 |
| mpg~ cyl+disp + hp + drat + wt + qsec + vs + am + gear | 0.8149224 | 161.7979 | 177.9210 | 1 | 17 |
The four models with the highest adjusted R² are carried forward for further diagnostics:
| Model | Adj_R_Squared | AIC | BIC | SignificantPredictors | ModelNumber |
|---|---|---|---|---|---|
| mpg~ cyl+disp + hp + drat + wt | 0.8227219 | 157.7659 | 168.0260 | 2 | 9 |
| mpg~ cyl+disp + hp + drat + wt + qsec | 0.8199798 | 159.0020 | 170.7279 | 1 | 11 |
| mpg~ cyl+disp + hp + drat + wt + qsec + vs + am | 0.8218062 | 160.0075 | 174.6648 | 1 | 15 |
| mpg~ cyl+disp + hp + drat + wt + qsec + vs + am + gear | 0.8149224 | 161.7979 | 177.9210 | 1 | 17 |
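A comparison table of this kind can be assembled as sketched below. This is not the report's original code: `summarise_model` is a hypothetical helper, only three of the formulas are refitted, and the 0.05 threshold for counting significant predictors (intercept excluded) is an assumption.

```r
# A small subset of the candidate formulas, chosen for illustration.
formulas <- c("mpg ~ cyl", "mpg ~ wt", "mpg ~ cyl + disp + hp + drat + wt")

# Hypothetical helper: fit one model and collect the comparison criteria.
# alpha = 0.05 is an assumed significance threshold.
summarise_model <- function(f, data = mtcars, alpha = 0.05) {
  fit <- lm(as.formula(f), data = data)
  s <- summary(fit)
  p_values <- coef(s)[-1, "Pr(>|t|)"]   # drop the intercept row
  data.frame(
    Model = f,
    Adj_R_Squared = s$adj.r.squared,
    AIC = AIC(fit),
    BIC = BIC(fit),
    SignificantPredictors = sum(p_values < alpha)
  )
}

# One row per formula, stacked into a single data frame.
model_table <- do.call(rbind, lapply(formulas, summarise_model))
print(model_table)
```

Lower AIC and BIC and higher adjusted R² indicate a better trade-off between fit and complexity, which is how the rows of the table are compared.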
ANOVA: Analysis of variance tables compare one or more fitted model objects. The residual sum of squares (RSS) measures the variation in the data that the model does not explain: it is the sum of squared differences between the observed and predicted values. A small RSS indicates a good fit, and the F-test in the ANOVA table shows whether adding predictors reduces the RSS significantly.
Variance Inflation Factor (VIF): VIF values help identify multicollinearity among predictors. Multicollinearity occurs when two or more predictors are highly correlated, which leads to unstable estimates of the regression coefficients. Values above 10 indicate problematic multicollinearity.
Root Mean Squared Error (RMSE): RMSE measures how closely the model’s predictions match the actual values. A lower RMSE indicates better model performance.
Residual Plots: Diagnostic plots help to check the assumptions of linear regression, including linearity, homoscedasticity, and normality of residuals.
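The three numeric criteria described above can be computed in base R as sketched here. The helpers `vif_manual` and `rmse` are hypothetical names; the VIF is computed from its definition, 1 / (1 - R²), rather than with a dedicated package, and assumes a model with at least two numeric predictors.

```r
# Two of the nested candidate models.
fit1 <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)
fit2 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec, data = mtcars)

# ANOVA: F-test of whether the extra term in fit2 reduces the RSS significantly.
print(anova(fit1, fit2))

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors.
vif_manual <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]   # drop the intercept column
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: in-sample RMSE from the model residuals.
rmse <- function(fit) sqrt(mean(residuals(fit)^2))

print(vif_manual(fit1))
print(rmse(fit1))
```

Computing the VIF by hand keeps the example free of extra packages and shows directly why a predictor that is well explained by the others receives a large value.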
## [1] "ANOVA"
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp + hp + drat + wt
## Model 2: mpg ~ cyl + disp + hp + drat + wt + qsec
## Model 3: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
## Model 4: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 167.43
## 2 25 163.48 1 3.9493 0.5875 0.4516
## 3 23 148.87 2 14.6040 1.0862 0.3549
## 4 22 147.90 1 0.9717 0.1445 0.7075
## [1] "Variance Inflation Factor VIF"
## [[1]]
## cyl disp hp drat wt
## 7.869010 10.463957 3.990380 2.662298 5.168795
##
## [[2]]
## cyl disp hp drat wt qsec
## 9.958978 10.550573 5.357783 2.966519 7.181690 4.039701
##
## [[3]]
## cyl disp hp drat wt qsec vs am
## 13.347224 10.646573 5.931238 3.122224 7.599975 6.635692 4.923095 4.162232
##
## [[4]]
## cyl disp hp drat wt qsec vs am
## 14.573542 11.783934 7.105430 3.230897 7.838669 6.984654 4.923203 4.630597
## gear
## 4.392711
## [1] "RMSE"
## [[1]]
## [1] 2.287371
##
## [[2]]
## [1] 2.260232
##
## [[3]]
## [1] 2.156913
##
## [[4]]
## [1] 2.149863
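In the analysis of variance, none of the larger models improves significantly on Model 1 (all p-values exceed 0.05), so at first glance Model 1 appears to be the preferred model. However, cyl and disp show high multicollinearity: the VIF for disp exceeds 10 in all four models, and the VIF for cyl rises from 7.9 to 14.6. The RMSE barely changes as predictors are added (2.29 to 2.15), and the high RSS values suggest that a better-fitting model may exist. To address this, cyl and disp are removed from all four models, and the reduced models are compared with the first set.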
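Dropping the two collinear predictors can be sketched with `update()`, shown here for the first model only. The object names `full` and `reduced` are illustrative, and this is one possible way to refit, not necessarily the one used for the report.

```r
full <- lm(mpg ~ cyl + disp + hp + drat + wt, data = mtcars)

# update() refits the model with cyl and disp removed from the formula.
reduced <- update(full, . ~ . - cyl - disp)

print(formula(reduced))        # the reduced formula keeps hp, drat and wt
print(deviance(reduced))       # for lm objects, deviance() is the RSS
print(anova(reduced, full))    # nested F-test: do cyl and disp add anything?
```

Because the reduced model is nested inside the full one, the F-test in the last line is a valid comparison of the two.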
## [1] "ANOVA"
## Analysis of Variance Table
##
## Model 1: mpg ~ hp + drat + wt
## Model 2: mpg ~ hp + drat + wt + qsec
## Model 3: mpg ~ hp + drat + wt + qsec + vs + am
## Model 4: mpg ~ hp + drat + wt + qsec + vs + am + gear
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 183.68
## 2 27 174.10 1 9.5784 1.4499 0.2403
## 3 25 158.56 2 15.5448 1.1765 0.3255
## 4 24 158.56 1 0.0035 0.0005 0.9819
## [1] "Variance Inflation Factor VIF - Corrected Models"
## [[1]]
## hp drat wt
## 1.769308 2.033837 2.869445
##
## [[2]]
## hp drat wt qsec
## 4.921958 2.035473 3.582683 2.876115
##
## [[3]]
## hp drat wt qsec vs am
## 5.070665 2.709905 5.105979 5.776361 4.120656 3.272177
##
## [[4]]
## hp drat wt qsec vs am gear
## 5.364885 3.028679 5.135893 5.794930 4.253778 4.257110 3.452507
## [1] "RMSE-Corrected Models"
## [[1]]
## [1] 2.395842
##
## [[2]]
## [1] 2.332538
##
## [[3]]
## [1] 2.225974
##
## [[4]]
## [1] 2.225949
Excluding the variables “cyl” and “disp” brings every VIF below 10 but does not significantly improve model performance compared with the initial set of models.
Despite the lack of significant improvement, the results hint at a rate that describes force with respect to time, which relates to the equation for power.
Power is taken here as the gross horsepower of the car (1 metric horsepower is about 735.5 watts, and 1 watt is 1 kg·m²/s³), a quantity that ties together mass, distance, and time. On that basis, two new models have been introduced, mpg ~ wt + qsec and mpg ~ qsec + hp, with the aim of addressing the multicollinearity observed in the previous models.
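The link between power, mass, distance, and time can be made explicit with a short dimensional sketch. Reading mass as wt and time as qsec (the quarter-mile time) is an interpretation of the reasoning above, not something the dataset itself states.

```latex
P = \frac{F \, d}{t} = \frac{m \, a \, d}{t},
\qquad
[P] = \mathrm{kg} \cdot \frac{\mathrm{m}}{\mathrm{s^2}} \cdot \mathrm{m} \cdot \frac{1}{\mathrm{s}}
    = \frac{\mathrm{kg \cdot m^2}}{\mathrm{s^3}} = 1\ \mathrm{W}
```

Here $F$ is force, $d$ distance, $t$ time, $m$ mass, and $a$ acceleration. Under this reading, weight and quarter-mile time together carry much of the information that horsepower does.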
## [1] "ANOVA"
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp + hp + drat + wt
## Model 2: mpg ~ cyl + disp + hp + drat + wt + qsec
## Model 3: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
## Model 4: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear
## Model 5: mpg ~ (wt + qsec)
## Model 6: mpg ~ (qsec + hp)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 167.43
## 2 25 163.48 1 3.949 0.5875 0.4516
## 3 23 148.87 2 14.604 1.0862 0.3549
## 4 22 147.90 1 0.972 0.1445 0.7075
## 5 29 195.46 -7 -47.563 1.0107 0.4507
## 6 29 408.89 0 -213.430
## [1] "Variance Inflation Factor VIF - All Models"
## [[1]]
## cyl disp hp drat wt
## 7.869010 10.463957 3.990380 2.662298 5.168795
##
## [[2]]
## cyl disp hp drat wt qsec
## 9.958978 10.550573 5.357783 2.966519 7.181690 4.039701
##
## [[3]]
## cyl disp hp drat wt qsec vs am
## 13.347224 10.646573 5.931238 3.122224 7.599975 6.635692 4.923095 4.162232
##
## [[4]]
## cyl disp hp drat wt qsec vs am
## 14.573542 11.783934 7.105430 3.230897 7.838669 6.984654 4.923203 4.630597
## gear
## 4.392711
##
## [[5]]
## hp drat wt
## 1.769308 2.033837 2.869445
##
## [[6]]
## hp drat wt qsec
## 4.921958 2.035473 3.582683 2.876115
##
## [[7]]
## hp drat wt qsec vs am
## 5.070665 2.709905 5.105979 5.776361 4.120656 3.272177
##
## [[8]]
## hp drat wt qsec vs am gear
## 5.364885 3.028679 5.135893 5.794930 4.253778 4.257110 3.452507
##
## [[9]]
## wt qsec
## 1.031487 1.031487
##
## [[10]]
## qsec hp
## 2.006342 2.006342
## [1] "RMSE- All Models"
## [[1]]
## [1] 2.287371
##
## [[2]]
## [1] 2.260232
##
## [[3]]
## [1] 2.156913
##
## [[4]]
## [1] 2.149863
##
## [[5]]
## [1] 2.471485
##
## [[6]]
## [1] 3.574623
The ANOVA results align with the Q-Q residual plots for each model: adding predictors brings little statistically significant improvement, and the lack of fit is most visible in the tails. Because Models 5 and 6 are not nested within Models 1 to 4, their rows in the ANOVA table should be read with caution. In most plots, the tails deviate noticeably from the reference line. Model 5, however, shows a distinct pattern in which the residuals eventually return to the line. Model 5 also shows the least multicollinearity according to the VIF results (about 1.03 for both predictors), although its RMSE (2.47) is slightly higher than that of Models 1 to 4 (2.15 to 2.29). Model 5 is therefore used for prediction and to answer the questions posed in this report.
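A Q-Q residual plot for Model 5 can be drawn with base graphics, as in the minimal sketch below. The object names `fit5`, `res5`, and `qq` are illustrative.

```r
fit5 <- lm(mpg ~ wt + qsec, data = mtcars)
res5 <- residuals(fit5)

# Normal Q-Q plot of the residuals; qqnorm() also returns the plotted points.
qq <- qqnorm(res5, main = "Normal Q-Q: mpg ~ wt + qsec")

# Reference line through the first and third quartiles.
qqline(res5)
```

Points that drift away from the reference line in the tails indicate residuals that are heavier-tailed or more skewed than a normal distribution would produce.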
Now that the linear model is established, it’s beneficial to visualize the data before making predictions.
Based on the graph, there is little difference in MPG between the two transmission types, although manual transmissions tend to have slightly higher MPG. While it might be tempting to simply observe where the two groups intersect to determine the difference, this report aims to analyze the overall trend. To do so, Model 5 is used with R’s predict() function to generate a predicted MPG for every car in each group. The difference in mean predicted MPG between the two subsets quantifies the fuel efficiency gap between manual and automatic cars.
## [1] "Automatic Transmission"
## Hornet 4 Drive Hornet Sportabout Valiant Duster 360
## 21.580569 18.196114 21.068588 16.443423
## Merc 240D Merc 230 Merc 280 Merc 280C
## 22.227120 25.123713 19.385488 19.943006
## Merc 450SE Merc 450SL Merc 450SLC Cadillac Fleetwood
## 15.368981 17.271134 17.390414 9.951297
## Lincoln Continental Chrysler Imperial Toyota Corona Dodge Challenger
## 8.924276 8.951388 25.896199 17.652896
## AMC Javelin Camaro Z28 Pontiac Firebird
## 18.481530 14.680913 16.179557
## [1] "Manual Transmission"
## Mazda RX4 Mazda RX4 Wag Datsun 710 Fiat 128 Honda Civic
## 21.81511 21.04822 25.32728 26.73215 28.80248
## Toyota Corolla Fiat X1-9 Porsche 914-2 Lotus Europa Ford Pantera L
## 28.97422 27.54022 24.46115 27.81207 17.21749
## Ferrari Dino Maserati Bora Volvo 142E
## 20.16588 15.29122 22.99592
## [1] "Mean Difference"
## [1] 6.089752
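The predictions and their mean difference can be reproduced along the following lines. This is a sketch assuming Model 5 is `mpg ~ wt + qsec` and that the groups are split on `am` (0 = automatic, 1 = manual); the variable names are illustrative.

```r
fit5 <- lm(mpg ~ wt + qsec, data = mtcars)

# Split the cars by transmission type.
automatic <- subset(mtcars, am == 0)
manual <- subset(mtcars, am == 1)

# Predicted MPG for every car in each group.
pred_automatic <- predict(fit5, newdata = automatic)
pred_manual <- predict(fit5, newdata = manual)

# Gap in mean predicted MPG, manual minus automatic.
mean_difference <- mean(pred_manual) - mean(pred_automatic)
print(mean_difference)
```

The sign convention matters: a positive value here means the manual group has the higher mean predicted MPG.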
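In this dataset, manual transmissions are more fuel efficient than automatic transmissions: the mean predicted MPG is about 6.09 mpg higher for manual cars (23.71 versus 17.62). Because Model 5 predicts MPG from weight and quarter-mile time only, this gap largely reflects the lighter weight of the manual cars in the sample rather than an isolated effect of the transmission itself.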